Search Result

Compilation optimizations for inconsistent control flow on deep computer unit

Xiaoyi YANG, Rongcai ZHAO, Hongsheng WANG, Lin HAN, Kunkun XU

Journal of Computer Applications 2023, 43 (10): 3170-3177. DOI: 10.11772/j.issn.1001-9081.2022091338

The domestic DCU (Deep Computer Unit) adopts the Single Instruction Multiple Thread (SIMT) parallel execution model. When a program runs, inconsistent control flow arising in a kernel function forces the threads in a warp to execute serially; this is known as warp divergence. To address the problem that kernel function performance is severely restricted by warp divergence, a compilation optimization method that reduces warp divergence time, Partial-Control-Flow-Merging (PCFM), was proposed. Firstly, divergence analysis was performed to find fusible divergent regions that are isomorphic and contain a large number of identical and similar instructions. Then, the fusion profit of each fusible divergent region was evaluated by computing the percentage of instruction cycles saved after merging. Finally, an alignment sequence was searched for, and the profitable fusible divergent regions were merged. Some test cases from the Graphics Processing Unit (GPU) benchmark suite Rodinia and the classic sorting algorithm were selected to test PCFM on the DCU. Experimental results show that PCFM achieves an average speedup ratio of 1.146 on the test cases, and that its speedup is 5.72% higher than that of the branch fusion + tail merging method. The proposed method is therefore more effective at reducing warp divergence.
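The profit evaluation described in this abstract can be illustrated with a toy model. The sketch below is not the PCFM implementation: it assumes each instruction costs one cycle, represents the two divergent paths as hypothetical lists of opcode names, and uses a longest-common-subsequence alignment as a stand-in for the alignment search. Under those assumptions, the profit is the fraction of cycles saved when aligned identical instructions are issued once instead of twice.

```python
# Toy model only: opcode lists, unit cycle costs, and the LCS-based alignment
# are illustrative assumptions, not the method from the paper.

def lcs_length(a, b):
    """Length of the longest common subsequence of two instruction lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def fusion_profit(then_path, else_path):
    """Fraction of cycles saved if aligned identical instructions are issued once."""
    # Under divergence, the warp executes both paths one after the other.
    divergent_cycles = len(then_path) + len(else_path)
    if divergent_cycles == 0:
        return 0.0
    shared = lcs_length(then_path, else_path)
    merged_cycles = divergent_cycles - shared  # each shared instruction issued once
    return (divergent_cycles - merged_cycles) / divergent_cycles


# Hypothetical divergent region: the two paths differ in one instruction.
then_path = ["load", "mul", "add", "store"]
else_path = ["load", "mul", "sub", "store"]
profit = fusion_profit(then_path, else_path)
print(f"estimated fraction of cycles saved: {profit:.3f}")
```

In this model the profit is bounded by 0.5, which is reached only when both paths are identical; a compiler pass would merge a region only when the estimate exceeds some threshold chosen by its cost model.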

Ensemble Extreme Learning Machine Based on the Members Similarity

YE Songlin, HAN Fei, ZHAO Minru

Journal of Computer Applications 2014, 34 (4): 1089-1093. DOI: 10.11772/j.issn.1001-9081.2014.04.1089

To increase the diversity among the selected members and thereby enhance the performance of the ensemble system, an ensemble Extreme Learning Machine (ELM) based on the selection of members by similarity, named EELMBSMS, was proposed. Firstly, candidate ELMs with high classification ability were selected. Then, the Particle Swarm Optimization (PSO) algorithm was used to select the optimal subset of ensemble members according to the similarity among the members. Selecting ELMs with low mutual similarity improved the diversity of the chosen members, which in turn effectively improved the classification performance of the ensemble system. The selected ELMs obtained better performance under different integration rules. Experimental results on four UCI datasets verify that EELMBSMS has better stability and generalization than some classical ensemble extreme learning machines.
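The idea of selecting low-similarity members can be sketched as follows. This is a minimal illustration under stated assumptions, not EELMBSMS itself: the candidate classifiers are represented only by hypothetical prediction vectors, similarity is assumed to be the fraction of samples on which two members agree, and an exhaustive search over subsets replaces PSO, which is practical only for a handful of candidates.

```python
from itertools import combinations


def similarity(p, q):
    """Fraction of samples on which two members predict the same label."""
    return sum(a == b for a, b in zip(p, q)) / len(p)


def mean_pairwise_similarity(preds, subset):
    """Average similarity over all pairs of members in the subset."""
    pairs = list(combinations(subset, 2))
    return sum(similarity(preds[i], preds[j]) for i, j in pairs) / len(pairs)


def select_diverse_subset(preds, k):
    """Exhaustive search (a stand-in for PSO) for the k least-similar members."""
    return min(combinations(range(len(preds)), k),
               key=lambda s: mean_pairwise_similarity(preds, s))


def majority_vote(preds, subset):
    """Combine the selected members' predictions by simple majority."""
    combined = []
    for t in range(len(preds[0])):
        votes = [preds[i][t] for i in subset]
        combined.append(max(set(votes), key=votes.count))
    return combined


# Hypothetical predictions of four candidate classifiers on six samples.
preds = [
    [1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1],  # identical to member 0, so it adds no diversity
    [1, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 0, 0],
]
chosen = select_diverse_subset(preds, 3)
print("selected members:", chosen)
print("ensemble output:", majority_vote(preds, chosen))
```

Because members 0 and 1 agree on every sample, any subset containing both has high mean similarity, so the search keeps only one of them. Majority voting is just one possible integration rule.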

Universal designated verifier signcryption scheme in standard model

MING Yang, ZHANG Lin, HAN Juan, ZHOU Jun

Journal of Computer Applications 2014, 34 (2): 464-468.

To address signature security problems in practice, a universal designated verifier signcryption scheme in the standard model was proposed, based on Waters' technique. Signcryption is a cryptographic primitive that performs encryption and signature in a single logical step. A universal designated verifier signature allows a signature holder who has a signature from a signer to convince a designated verifier that he possesses the signer's signature, while the verifier cannot transfer this conviction to anyone else; only the designated verifier can verify the existence of the signature. By combining universal designated verifier signatures with signcryption, the scheme eliminates the need for a secure channel for signature transmission between the signer and the signature holder. Under the assumption that the Computational Bilinear Diffie-Hellman (CBDH) problem is hard, the scheme was proved secure. Compared with existing schemes, the proposed scheme has better computational efficiency.